On the Identification of Word-Boundaries using Phonological Rules for Speech Recognition and Labeling
Abstract
This paper studies the phonemic structure of word beginnings and endings in standard European Portuguese (hereafter EP). The generativist description of Portuguese phonology [1] served as the theoretical framework, and the phonetic and acoustic experiments performed by Delgado-Martins [2] served as the model for the phonetic background of EP. We also compared the phonemes expected phonologically in word-initial and word-final position with their phonetic realization in actual speech, which is affected by dialectal and contextual phenomena, including coarticulation. To do so, we used two corpora: a text corpus (vide Linguateca), to find the grapheme realizations at the beginnings and endings of words, and a speech corpus (vide Teixeira et al. 2001), to obtain a phonetic view of what happens in natural continuous speech and to confirm some phonetic tendencies of Portuguese. Probably the most important conclusion of this study is the confirmation that syllabic restructuring is an ongoing process in modern spoken EP: the suppression of unstressed vowels is allowing syllables that consist of a single consonant to form. Other trends in the distribution of phonemes at word endings and beginnings are also presented and discussed. The purpose of this work is to contribute linguistic rule-based information that can improve word-boundary segmentation in speech recognition systems. Word identification is well known to be difficult in spontaneous speech because of the lack of pauses marking word boundaries. We therefore believe that the stochastic methods commonly used in word recognition systems, such as Hidden Markov Models (HMMs) and neural networks, can be strengthened and complemented by taking linguistic rules and phonological and phonetic tendencies into account.
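The text-corpus step described in the abstract (finding which graphemes occur at the beginnings and endings of words) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function name `boundary_grapheme_counts`, the regular-expression tokenization, and the sample sentence are all hypothetical, and a real study would use the corpus's own tokenization.

```python
import re
from collections import Counter


def boundary_grapheme_counts(text):
    """Count the first and last letter of every word in `text`.

    Hypothetical helper for illustration only; words are taken to be
    maximal runs of Unicode letters, so hyphenated forms are split.
    """
    # [^\W\d_]+ matches runs of letters (including accented ones),
    # excluding digits and underscores.
    words = re.findall(r"[^\W\d_]+", text.lower())
    initial = Counter(word[0] for word in words)
    final = Counter(word[-1] for word in words)
    return initial, final


# Made-up sample sentence, not taken from either corpus.
sample = "as casas amarelas ficam perto do mar"
initial, final = boundary_grapheme_counts(sample)

# Most frequent word-initial and word-final graphemes in the sample.
print(initial.most_common(3))
print(final.most_common(3))
```

Counts of this kind describe graphemes only; relating them to phonemes would additionally require a grapheme-to-phoneme mapping, which this sketch does not attempt.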
Similar resources
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Verbal-Auditory Skills in 5-year-Old Children of Semnan/Iran in 2006
Introduction: This research was planned to determine some verbal-auditory skills (verbal-auditory short memory and phonological awareness) that have the closest relationship with speech and language development in 5-year-old children. Method: In this descriptive cross-sectional study, 400 children of pre-school classes affiliated to Education and Welfare organizations in Semnan city were select...
Improving recognition performance by modelling pronunciation variation
This paper describes a method for improving the performance of a continuous speech recognizer by modelling pronunciation variation. Although the improvements obtained with this method are small, they are in line with those reported by other authors. A series of experiments was carried out to model pronunciation variation. In the first set of experiments word internal pronunciation variation was...
Continuous Speech Recognition at LIMSI
This paper presents some of the recent research on speaker-independent continuous speech recognition at LIMSI, including efforts in phone and word recognition for both French and English. Evaluation of an HMM-based phone recognizer on a subset of the BREF corpus gives a phone accuracy of 67.1% with 35 context-independent phone models and 74.2% with 428 context-dependent phone models. The word ac...
Word segmentation in Persian continuous speech using F0 contour
Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...